Skip to content

Conversation

Dan-Flores
Copy link
Contributor

@Dan-Flores Dan-Flores commented Sep 30, 2025

This PR updates the VideoEncoder to support encoding for common video container formats, nearly identically to the FFmpeg CLI.

Changes:

Some changes are made to align with the design in #907:

Testing

  • test_video_encoder_round_trip: Ensures that a video's decoded frames are the same after encoding then decoding.
    • mov, mp4, mkv, webm
  • test_video_encoder_against_ffmpeg_cli: Ensures that the VideoEncoder frames are the same as the FFmpeg CLI.
    • mov, mp4, avi, mkv, webm, flv, gif

Testing caveats

  • The crf parameter is needed to test lossless encoding in the round trip test. For formats that do not support crf, the round trip test is not availablle.
  • When lossy encoding occurs due to codec + pixel format selection, assert_close is substituted by assert_tensor_close_on_at_least with a lower percentage match (96-99), and a higher atol (2-15).

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Sep 30, 2025
@Dan-Flores Dan-Flores force-pushed the encoder_accuracy branch 3 times, most recently from f414d0b to c29dee3 Compare October 6, 2025 15:39
@Dan-Flores Dan-Flores marked this pull request as ready for review October 7, 2025 14:51
avCodecContext_->global_quality = FF_QP2LAMBDA * qp;
}
int status = avcodec_open2(avCodecContext_.get(), avCodec, &options);
av_dict_free(&options);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above, the crf parameter is reused to set qscale to encode high quality videos in round-trip tests. But, the C++ function only allows crf to be set, not qscale. Since qscale is not needed anywhere else, I did not think it was worth including, but I am open to feedback here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the context.

My understanding is that for those codecs that do not support crf, we set instead the qscale (quantizer scale) parameter. They both control encoding quality, but in different ways.

I think... we should avoid doing that. I don't have a good enough understanding of how these 2 parameters (and their values!) relate to each other, and I think we can punt on that for a first release of the encoder. Especially since we only really need this workaround for our round-trip test to run. It means we won't be able to do the run-trip tests on those formats, but that's OK:

  • those formats aren't that popular anyway
  • we should still be able to do the test against the FFmpeg CLI.

test/test_ops.py Outdated
Comment on lines 1367 to 1372
assert (
source_frames.shape == round_trip_frames.shape
), f"Shape mismatch: source {source_frames.shape} vs round_trip {round_trip_frames.shape}"
assert (
source_frames.dtype == round_trip_frames.dtype
), f"Dtype mismatch: source {source_frames.dtype} vs round_trip {round_trip_frames.dtype}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use plain assert like

assert source_frames.shape == round_trip_frames.shape

pytest will provide a proper error message

test/test_ops.py Outdated
Comment on lines 1381 to 1385
atol = 2
# Check that PSNR for decode(encode(samples)) is above 30
for s_frame, rt_frame in zip(source_frames, round_trip_frames):
res = psnr(s_frame, rt_frame)
assert res > 30
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize it's not from this PR but let's clean that up a little:

  • the comment isn't needed as the code is really self explanatory (and it doesn't just do psnr validation anymore!)
  • no need to store res
Suggested change
atol = 2
# Check that PSNR for decode(encode(samples)) is above 30
for s_frame, rt_frame in zip(source_frames, round_trip_frames):
res = psnr(s_frame, rt_frame)
assert res > 30
atol = 2
for s_frame, rt_frame in zip(source_frames, round_trip_frames):
assert psnr(s_frame, rt_frame) > 30

assert_close(s_frame, rt_frame, atol=atol, rtol=0)

@pytest.mark.skipif(in_fbcode(), reason="ffmpeg CLI not available")
@pytest.mark.skipif(in_fbcode(), reason="ffmpeg CLI not available")
Copy link
Contributor

@NicolasHug NicolasHug Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's duplicated

Suggested change
@pytest.mark.skipif(in_fbcode(), reason="ffmpeg CLI not available")

test/test_ops.py Outdated
"Codec for webm is not available in the FFmpeg6/7 installation on Windows."
)
asset = TEST_SRC_2_720P
# Test that decode(encode(decode(asset))) == decode(asset)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be at the top

test/test_ops.py Outdated
f.write(source_frames.permute(0, 2, 3, 1).cpu().numpy().tobytes())

ffmpeg_encoded_path = str(tmp_path / f"ffmpeg_output.{format}")
# Test that lossless encoding is identical
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this comment is needed here, it looks out of place


const AVPixelFormat* getSupportedPixelFormats(const AVCodec& avCodec) {
const AVPixelFormat* supportedPixelFormats = nullptr;
#if LIBAVCODEC_VERSION_INT >= AV_VERSION_INT(61, 13, 100)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comment to specify which major FFmpeg version this correspond to

Comment on lines 797 to 799
// av_packet_rescale_ts ensures encoded frames have correct timestamps.
// This prevents "no more frames" errors when decoding encoded frames,
// https://github.com/pytorch/audio/blob/b6a3368a45aaafe05f1a6a9f10c68adc5e944d9e/src/libtorio/ffmpeg/stream_writer/encoder.cpp#L46
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/pytorch/audio/blob/b6a3368a45aaafe05f1a6a9f10c68adc5e944d9e/src/libtorio/ffmpeg/stream_writer/encoder.cpp#L46 links to

    if (packet->duration == 0 && codec_ctx->codec_type == AVMEDIA_TYPE_VIDEO) {
      // 1 means that 1 frame (in codec time base, which is the frame rate)
      // This has to be set before av_packet_rescale_ts bellow.
      packet->duration = 1;
    }

which seems to be about the lines just above.

Is this comment at the right place? Maybe it should be a few lines above - and it should also explain why we need to set duration to 1 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the comment to be above the packet->duration and av_packet_rescale_ts lines and updated the text to reference it as "the code below". Let me know if I misunderstood your suggestion

UniqueEncodingAVFormatContext avFormatContext_;
UniqueAVCodecContext avCodecContext_;
int streamIndex_ = -1;
AVStream* avStream_;
Copy link
Contributor

@NicolasHug NicolasHug Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC we now store avStream_ mostly because we need to access time_base? If that's the case, then let's get rid of the streamIndex_ field because it can now be accessed through avStream_

Copy link

meta-codesync bot commented Oct 10, 2025

@Dan-Flores has imported this pull request. If you are a Meta employee, you can view this in D84393092.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants